-
Couldn't load subscription status.
- Fork 1.5k
BUG: honor XObject /ColorSpace hint when deriving image mode #3498
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Preserve hinted color spaces when deriving XObject image modes
|
Thanks for the PR. Could you please elaborate on the reasons for this change? Additionally, please have a look at the CI failures for us to continue. |
|
What changed & why _get_mode_and_invert_color now treats the XObject’s /ColorSpace entry as an authoritative hint for the final Pillow mode and propagates that hint through the existing _get_imagemode(...) logic: We derive a preferred_mode from IA.COLOR_SPACE: /DeviceRGB → "RGB", /DeviceGray → "L", /DeviceCMYK → "CMYK". We pass preferred_mode as prev_mode to _get_imagemode(...) for both the <8 bpc path and the general path. If _get_imagemode(...) collapses to "" (e.g., NullObject, ambiguous Indexed/Separation/DeviceN chains), we fall back to the hint and set invert_color=True when the hint is CMYK—matching existing behavior where CMYK implies inversion. Problem this solves Some PDFs declare a reliable /ColorSpace on the image XObject, but the resolved color space passed to _get_imagemode(...) can be ambiguous or collapse (e.g., NullObject, ICCBased without usable Alternate, certain DeviceN/Indexed cases). Previously we only special-cased /DeviceRGB, so the function could return "" (unknown) and lose the author intent—leading to wrong modes (e.g., grayscale treated as RGB) or missed CMYK inversion. Compatibility If _get_imagemode(...) returns a concrete mode, nothing changes. The hint is only used as a tie-breaker / fallback and to seed prev_mode, so it doesn’t override explicit resolvable color spaces. |
|
My above requests still hold through. Where do we have such an example? Could we please have a proper test for this? (I do not think that the possibly AI-generated initial and previous comment explain this issue in a suitable way to make it clear to me.) |
Summary
Honor the XObject /ColorSpace hint when deriving Pillow image modes in _get_mode_and_invert_color, especially when nested/indirect color spaces collapse or are ambiguous. This prevents mode mis-detection and preserves CMYK inversion semantics.
Motivation / Problem
In real-world PDFs the image XObject often carries a correct /ColorSpace, while the resolved color space provided to _get_imagemode(...) may be:
NullObject (e.g., after following references),
ICCBased without a usable /Alternate,
Indexed / Separation / DeviceN chains that don’t yield a direct Device* at the point of resolution.
Previously we:
Special-cased only /DeviceRGB, ignoring /DeviceGray and /DeviceCMYK.
Didn’t pass any prior/hinted mode into _get_imagemode(...).
Had no fallback if _get_imagemode(...) returned "".
This could yield the wrong final mode and miss CMYK inversion.
Approach
Compute preferred_mode from IA.COLOR_SPACE:
/DeviceRGB → "RGB", /DeviceGray → "L", /DeviceCMYK → "CMYK".
Pass preferred_mode to _get_imagemode(...) as prev_mode for both:
the <8 bpc branch and
the general branch (including the “2 components if not explicitly gray” rule).
If the derived mode is still "", fall back to preferred_mode.
If the fallback is CMYK, set invert_color=True (consistent with current CMYK handling elsewhere).
Effects / Compatibility
No change when _get_imagemode(...) already returns a concrete mode.
Only ambiguous/degenerate cases change—and they now align with the XObject’s declared /ColorSpace.
Keeps existing decode and flate/JPX handling untouched.
Tests
Added tests asserting that, when color_space collapses (e.g., NullObject), we still:
choose "RGB" for a /DeviceRGB hint (no invert),
choose "L" for /DeviceGray (no invert),
choose "CMYK" for /DeviceCMYK and set invert_color=True.